Statistical Hypothesis Testing in Positive Unlabelled Data

Authors

Konstantinos Sechidis

Borja Calvo

Gavin Brown

Abstract

We propose a set of novel methodologies that enable valid statistical hypothesis testing when only positive and unlabelled (PU) examples are available. This type of problem, a special case of semi-supervised data, is common in text mining, bioinformatics, and computer vision. Focusing on a generalised likelihood ratio test, we make three key contributions: (1) a proof that assuming all unlabelled examples are negative cases is sufficient for independence testing, but not for power analysis; (2) a new methodology that compensates for this and enables power analysis, allowing sample size determination for observing an effect with a desired power; and (3) a new capability, supervision determination, which can determine a priori the number of labelled examples the user must collect before being able to observe a desired statistical effect. Beyond general hypothesis testing, we suggest the tools will also be useful for information-theoretic feature selection and Bayesian network structure learning.
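To make the first contribution concrete, the following is a minimal sketch of a generalised likelihood ratio test (G-test) of independence between a binary feature and a surrogate label in which every unlabelled example is treated as negative. It is an illustration, not the authors' code: the function names `g_statistic` and `p_value_2x2` and the counts in the table are hypothetical, and the sketch covers only the independence test, not the correction that power analysis requires.

```python
import math

def g_statistic(table):
    """Return G = 2 * sum(O * ln(O / E)) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes 0 to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g

def p_value_2x2(g):
    """Chi-squared survival function with 1 degree of freedom (2x2 tables only)."""
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))

# Hypothetical counts. Rows: feature value 0 / 1.
# Columns: unlabelled (assumed negative) / labelled positive.
pu_table = [[45, 5],
            [25, 25]]

g = g_statistic(pu_table)
p = p_value_2x2(g)
print(f"G = {g:.3f}, p = {p:.3g}")
```

Under the null hypothesis of independence, G is asymptotically chi-squared distributed, so a small p-value is evidence against independence between the feature and the surrogate label. For tables larger than 2x2 the degrees of freedom change and a general chi-squared survival function would be needed instead of `p_value_2x2`.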

Similar resources

TESTING STATISTICAL HYPOTHESES UNDER FUZZY DATA AND BASED ON A NEW SIGNED DISTANCE

This paper deals with the problem of testing statistical hypotheses when the available data are fuzzy. In this approach, we first obtain a fuzzy test statistic based on fuzzy data, and then, based on a new signed distance between fuzzy numbers, we introduce a new decision rule to accept/reject the hypothesis of interest. The proposed approach is investigated for two cases: the case without nuisance p...

Testing the weak form of efficient market hypothesis in carbon efficient stock indices along with their benchmark indices in select countries

This paper presents the results of tests on the weak form of Efficient Market Hypothesis applied to carbon efficient stock market indices of India, the United States of America (USA), Japan, and Brazil and their corresponding market indices which are used as their benchmark indices. In this study, Kolmogorov-Smirnov and Shapiro-Wilk tests are used to test the normality of data. Run test and auto...

Testing for Stochastic Non-Linearity in the Rational Expectations Permanent Income Hypothesis

The Rational Expectations Permanent Income Hypothesis implies that consumption follows a martingale. However, most empirical tests have rejected the hypothesis. Those empirical tests are based on linear models. If the data generating process is non-linear, conventional tests may not assess some of the randomness properly. As a result, inference based on conventional tests of linear models can b...

False Discovery Rates

In hypothesis testing, statistical significance is typically based on calculations involving p-values and Type I error rates. A p-value calculated from a single statistical hypothesis test can be used to determine whether there is statistically significant evidence against the null hypothesis. The upper threshold applied to the p-value in making this determination (often 5% in the scientific li...

Quality estimation of multiple sequence alignments by Bayesian hypothesis testing

In this work we present a web-based tool for estimating multiple alignment quality using Bayesian hypothesis testing. The proposed method is very simple, easily implemented and not time consuming with a linear complexity. We evaluated the method against a series of different alignments (a set of random and biologically derived alignments) and compared the results with tools based on clas...

Publication date: 2014
